Relative Depth for Behavior Based Recognition

Authors

  • Ehud Rivlin
  • Liuqing Huang
Abstract

The problem of object recognition from sensory data is traditionally a problem of finding structure from an array of intensity functions. To solve the problem, a depth map is recovered and matched against a set of possible structures to suggest the most likely objects. However, few schemes of this nature have been able to perform beyond laboratory assumptions. We propose to study object recognition by asking the question in the context of an agent performing the recognition in an environment where the agent is performing a behavior. In our paradigm, the problem becomes a problem of action from intensity functions. In accomplishing a behavior, we determine our next step of action from the images. Acquiring the information for action is a solution for a recognition task. The recognition task is agent- and behavior-dependent and can use the output of different visual modules. One possible visual module for the object recognition process is a module that gives qualitative depth information. We discuss the way such a module can operate. We conclude that many recognition tasks become easy under this paradigm, as recognition is reduced to qualitative judgements of the scene. We prove our point with a set of real-world examples based on visual information from a relative-depth visual module.

1 Behavior based recognition

The problem of object recognition from sensory data is defined in the literature as the association of visual input with a name or a symbol. Although very good research on the topic has been published, we still lack vision systems that can recognize in real time a large number of objects (natural or man-made). This is because in order to recognize an object one would have to visually recover it (i.e., its shape and various properties) and then match this recovered information against a database of known objects [2]. However, full recovery is hard to achieve to date, and matching suffers from combinatorial explosion.
Model-based recognition, on the other hand, has been suggested as a remedy to these problems. By requiring that the objects to be recognized are specific instances of a generic model (e.g., polyhedra, generalized cylinders, cones, superquadrics, etc.), the problems of surface recovery and matching become easier. However, model-based approaches obviously employ strong assumptions about the nature of the scene and thus lack generality. We proposed [3] to study the problem of object recognition by asking a different question, i.e., by considering it in the context of an agent performing it in an environment, where the agent's intentions translate into a set of behaviors. What are objects for? An object can suit a purpose, fulfill a function. If the agent recognizes this, it has in effect recognized the object. To perform this type of recognition we need, on one hand, a definition of the desired function and, on the other, the means of determining whether the object can fulfill that function. To find out if an object can fulfill a function we need to perform various partial recovery tasks. An agent is defined as a set of intentions, I_1, I_2, ..., I_n. Each intention I_k is translated into a set of behaviors, B_k1, B_k2, ..., B_km. Each behavior B_ki calls for the completion of recognition tasks T_ki1, T_ki2, ..., T_kin. The agent acts in behavior B_ki under intention I_k, and the behavior sets parameters for the recognition tasks. Note that the same object can answer positively to several recognition tasks. Under one behavior a chair will answer yes to a recognition task that is asking for obstacles, under another behavior it will answer yes to a recognition task that is asking for a sitting place, and under yet another it will answer yes to a task that is asking for an assault weapon.

0-7803-0720-8/92 $3.00 © 1992 IEEE
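The intention-to-behavior-to-task decomposition above can be sketched as a small data structure. This is only an illustrative rendering of the formalism I_k → B_ki → T_kij; the concrete intention, behavior, and task names below are our hypothetical examples, not the paper's:

```python
# Sketch of the agent formalism: an agent is a set of intentions I_1..I_n,
# each intention I_k maps to behaviors B_k1..B_km, and each behavior B_ki
# calls for recognition tasks T_ki1..T_kin. All concrete entries below
# are illustrative assumptions.
agent = {
    "drink": {                        # intention I_1
        "search-drinkable": [         # behavior B_11
            "is-container", "is-open-at-one-end", "is-graspable",
        ],
    },
    "defend": {                       # intention I_2
        "search-throwable": [         # behavior B_21
            "is-rigid", "is-mobile", "is-graspable", "is-not-too-light-or-heavy",
        ],
    },
}

def tasks_for(intention):
    """Collect all recognition tasks T_ki* called for under intention I_k."""
    return [task for behavior in agent[intention].values() for task in behavior]
```

Note that the translation tables are compiled ahead of time, matching the paper's suggestion of compiled knowledge rather than an automatic intention-to-behavior transformation.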
We view the recognition process along the axis intention, behavior, recognition task. For a theory of purposive object recognition we should be able to make two basic transformations: first, from the desired intention to the set of behaviors that achieve it; second, from a specific behavior to the needed recognition task(s). In [3] we showed that the intention-to-behaviors problem with a finite number of behaviors is undecidable. We believe that a general automatic transformation from behaviors to recognition tasks is also hard. A possible solution is to use compiled knowledge and build a set of useful translations.

2 Recognition tasks under a specific behavior

In order for us to build working systems, a natural direction will be to use a set of translations. It seems that a certain class of robots will share some common, useful, functional translations. We can define categories like animate, inanimate, prey, predator, obstacle, etc. that will belong to some hierarchical structure. The hierarchies are functional and have perceptual substance. They must have perceptual characteristics that make them discriminable. These functional relationships (here functional is used in the utilitarian sense) can be translated, for example, into surface characteristics and geometric properties in a crude qualitative way. Each recognition task activates a different collection of basic perceptual modules. Each module finds a generic object property which is the result of one or a combination of direct low-level computations on some sensory data (possibly done by other modules). The result of a module's operation is given as a qualitative value. Each module has its own neighboring open intervals, which are parameter-specific. Such modules, for example, might provide information about the size of the object under consideration (very small, small, medium, large, very large) relative to the observer, various shape features of the object under consideration, etc.
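A module that maps a continuous parameter onto neighboring qualitative intervals might look like the following sketch. The five size labels come from the text; the numeric interval boundaries are invented for illustration and would in practice be parameter-specific, as the paper notes:

```python
import bisect

# A qualitative module partitions a continuous parameter into neighboring
# intervals and reports only an interval label. The threshold values below
# (say, object extent as a fraction of the field of view) are hypothetical.
SIZE_LABELS = ["very small", "small", "medium", "large", "very large"]
SIZE_CUTS = [0.05, 0.15, 0.35, 0.65]  # boundaries between the five intervals

def size_module(relative_size):
    """Return a qualitative size value for an object relative to the observer."""
    return SIZE_LABELS[bisect.bisect_right(SIZE_CUTS, relative_size)]
```

For example, under these assumed cuts, a measurement of 0.5 falls in the fourth interval and the module reports only "large"; the exact value is discarded, which is the sense in which the output is a partial recovery of the scene.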
More complex modules might answer, for example, questions such as: is the object graspable? (based on low-level modules of size, shape, etc.), is the object a possible container? etc. A recognition task for a "cup concept" might be activated and performed in the following manner. A drinking intention could activate a searching behavior for "something that it is possible to drink from". This behavior calls for a recognition task with this definition as a source for the translation process to functional properties. This definition asks for a container that is open at one end, of reasonable size, and graspable. These sub-tasks will be answered by the different visual modules. As another example, in a defense scenario we might have the intention of throwing an object at an attacker. This calls for a behavior of searching for "something that it is possible to throw and create damage". This behavior will activate a recognition task for something that is rigid, mobile, graspable, and not too light or heavy. The following modules might be activated under this recognition task: Is it animate? mobile? graspable? hard? Note that the cup gives positive answers to all these questions. Under the defense intention the cup can be used as a missile. A cup and a stone will give the same values, and are equally good for the current intention and behavior. As a final example, a frog's feeding intention calls for a prey-catching behavior which will include a recognition task for a "bug concept". This task could make use of the following modules: Is it moving? Is the motion rapid (by the agent's scale)? small (by the agent's scale)? dark? reachable?

3 Integrating visual modules for recognition

Under our framework an agent acts in behavior B_ki under intention I_k. The behavior calls for the completion of recognition tasks T_ki1, ..., T_kin. The behavior sets parameters for the recognition tasks. Each recognition task activates a different collection of basic perceptual modules.
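The cup-and-stone example can be sketched as recognition tasks that are conjunctions of module answers. The per-object module outputs below are hand-set, hypothetical stand-ins for real visual (and tactile) modules:

```python
# A recognition task succeeds when every module it activates answers "yes".
# The answers recorded here are assumed values, not computed from imagery.
MODULE_ANSWERS = {
    "cup":   {"rigid": True, "mobile": True, "graspable": True, "container": True},
    "stone": {"rigid": True, "mobile": True, "graspable": True, "container": False},
}

def run_task(obj, activated_modules):
    """A recognition task: the conjunction of the activated modules' answers."""
    return all(MODULE_ANSWERS[obj][m] for m in activated_modules)

# Task definitions set by two different behaviors (illustrative):
THROWABLE_TASK  = ["rigid", "mobile", "graspable"]   # defense behavior
DRINK_FROM_TASK = ["container", "graspable"]         # drinking behavior
```

Under this sketch the cup and the stone answer identically to the throwable task, matching the paper's observation that both are equally good missiles under the defense intention, while only the cup passes the drink-from task.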
Each module finds a generic object property which is the result of one or a combination of direct low-level computations on some sensory data (possibly done by other modules). The result of a module's operation is given as a qualitative value. Each module has its own neighboring open intervals, which are parameter-specific. The i-th module can take one of q_i1, ..., q_in qualitative values. The state of our recognition system, denoted by Q_i, is the tuple of all the qualitative values of our modules (q_1, ..., q_m) under recognition task T_kij. Each recognition task T_kij defines a system state that will

Footnotes:
1. See [4] for a similar approach.
2. This, as well as the rigidity requirement for graspability, requires tactile sensory information.
3. Such qualitative values represent a partial recovery of the scene.





Publication date (per the indexing site): 2004